# High-precision OCR

En PP OCRv4 Mobile Rec
Apache-2.0
An ultra-lightweight English text line recognition model developed by the PaddleOCR team, supporting the recognition of English and numeric characters
Text Recognition Supports Multiple Languages
PaddlePaddle
303
0
Slanext Wired
Apache-2.0
SLANeXt_wired is a deep learning model for table structure recognition that converts non-editable table images into editable table formats (such as HTML).
Text Recognition Supports Multiple Languages
PaddlePaddle
1,141
0
PP OCRv5 Server Det
Apache-2.0
PP-OCRv5_server_det is the latest-generation text detection model developed by the PaddleOCR team. Designed for high-performance application scenarios, it detects handwritten, vertical, rotated, and curved text and supports multiple languages.
Text Recognition Supports Multiple Languages
PaddlePaddle
8,722
2
Llama 3.1 Nemotron Nano VL 8B V1
Other
Llama-3.1-Nemotron-Nano-VL-8B-V1 is a document-intelligence vision-language model that can query and summarize images and videos and supports deployment across multiple environments.
Image-to-Text Transformers
nvidia
1,092
66
Sapnous VR 6B
Apache-2.0
Sapnous-6B is an advanced vision-language model that enhances perception and understanding of the world through powerful multimodal capabilities.
Image-to-Text Transformers English
Sapnous-AI
261
5
Aya Vision 32b
Aya Vision 32B is an open-weight 32B parameter multimodal model developed by Cohere Labs, supporting vision-language tasks in 23 languages.
Image-to-Text Transformers Supports Multiple Languages
CohereLabs
387
193
Typhoon2 Qwen2vl 7b Vision Instruct
Apache-2.0
Typhoon2-Vision is a vision-language model with Thai support that processes image and video inputs and is specifically optimized for image-based applications.
Image-to-Text Transformers Supports Multiple Languages
scb10x
793
11
Paligemma2 3b Mix 224
PaliGemma 2 is an upgraded vision-language model developed by Google that incorporates the capabilities of Gemma 2. It takes image and text inputs and generates text outputs, making it suitable for various vision-language tasks.
Image-to-Text Transformers
google
15.23k
28
TF ID Base
MIT
TF-ID is a series of object detection models specifically designed to extract tables and figures along with their caption texts from academic papers.
Image-to-Text Transformers
yifeihu
408
36
TF ID Large
MIT
TF-ID is a visual object detection model fine-tuned from Florence-2 and specifically designed for extracting tables and charts from academic papers.
Object Detection Transformers
yifeihu
9,893
21
Pix2text Mfr Quantized
MIT
Pix2Text's Mathematical Formula Recognition (MFR) model, trained based on the TrOCR architecture, can convert mathematical formula images into LaTeX text representations.
Text Recognition Transformers
Brian314
37
0
Pix2text Mfd
MIT
Pix2Text's Mathematical Formula Detection (MFD) model, which locates mathematical formulas in images.
Text Recognition Other
breezedeus
1,369
3
Extract Matic
MIT
Sparrow is a document data extraction model fine-tuned on invoice data based on the Donut ML foundation model, designed to validate Donut's performance on enterprise documents.
Image-to-Text Transformers English
ssraut
17
0
Extract Matic
MIT
Sparrow is a document data extraction tool fine-tuned on invoice data based on the Donut ML foundation model, designed to validate Donut's performance on enterprise documents.
Image-to-Text Transformers English
PCS
17
0
Final Model
Apache-2.0
An image-to-text model released under the Apache-2.0 license that converts image content into textual descriptions.
Text Recognition Transformers
goatrider
17
0
Output LayoutLMv3 V7
A document understanding model fine-tuned based on microsoft/layoutlmv3-base, excelling in document layout analysis tasks
Text Recognition Transformers
Noureddinesa
18
1
Minicpm V 2
MiniCPM-V 2.0 is a multimodal large language model designed for efficient on-device deployment, built on SigLip-400M and MiniCPM-2.4B, which are connected by a perceiver resampler.
Image-to-Text Transformers Supports Multiple Languages
openbmb
9,097
461
Trocr Base Plate Number
Apache-2.0
A vision model for recognizing vehicle license plates, capable of extracting plate numbers from images.
Text Recognition Transformers
ristek-dsa
29
0
Pix2text Mfr
MIT
Pix2Text's Mathematical Formula Recognition (MFR) model, trained based on the TrOCR architecture, capable of converting mathematical formula images into LaTeX text representations.
Text Recognition Transformers
breezedeus
5,753
35
Trocr Base Printed License Plates Ocr Timestamp
An OCR model fine-tuned based on microsoft/trocr-base-printed, specifically designed for recognizing license plates and timestamp information
Text Recognition Transformers
PQAshwin
132
1
Nougat For Formula
Apache-2.0
A fine-tuned mathematical formula recognition model based on Nougat-small, excelling in extracting LaTeX formula code from images
Image-to-Text Transformers
CuiSiwei
40
5
Donut Demo
MIT
An image-to-text model associated with the CORD-v2 dataset, primarily used for extracting and recognizing text content from images.
Text Recognition Transformers
zhongren2
20
0
Nougat
This model is outdated; the official Nougat model is recommended instead. Nougat is a vision-language model focused on document understanding and analysis.
Image-to-Text Transformers
nielsr
14
4
Trocr MICR
An OCR model specifically designed for transcribing E-13B MICR codes, fine-tuned from Microsoft's TrOCR-large-stage1.
Text Recognition Transformers English
Apocalypse-19
94
1
Pix2struct Tiny Random
MIT
An image-to-text model released under the MIT license that converts image content into descriptive text.
Image-to-Text Transformers
fxmarty
60.87k
2
General Image Captioning
Apache-2.0
An image-to-text model released under the Apache-2.0 license that converts image content into textual descriptions.
Text Recognition Transformers Other
alibidaran
30
0
Thesisdonut
MIT
A model fine-tuned from naver-clova-ix/donut-base; its specific uses and functions are not documented.
Image-to-Text Transformers
Humayoun
13
0
Layoutlmv3 Finetuned DocLayNet
A document layout analysis model fine-tuned based on the LayoutLMv3 architecture, specifically designed for document element classification tasks in the DocLayNet dataset.
Text Recognition Transformers English
Mit1208
226
1
Invoices Donut Model V1
MIT
Sparrow is a document data extraction model fine-tuned on invoice data based on the Donut ML foundation model, aimed at validating Donut's performance on enterprise documents.
Image-to-Text Transformers English
katanaml-org
216
38
Mscoco Finetuned CoCa ViT L 14 Laion2b S13b B90k
MIT
An image-to-text model released under the MIT license that converts image content into textual descriptions.
Image-to-Text
laion
21.02k
20
Donut Demo
MIT
This is a Donut model fine-tuned on the CORD-v2 dataset, designed for image-to-text tasks, achieving an average accuracy of 0.901.
Image-to-Text Transformers
katanaml
24
3
Layoutlmv3 Finetuned Funsd
A document understanding model fine-tuned from microsoft/layoutlmv3-base on the nielsr/funsd-layoutlmv3 dataset.
Text Recognition Transformers
Narsil
799
0
Dof Passport 1
MIT
A model fine-tuned from naver-clova-ix/donut-base; its specific purpose is not explicitly stated.
Image-to-Text Transformers
Sebabrata
16
0
OCR LayoutLMv3 Invoice
An invoice recognition model fine-tuned based on LayoutLMv3-base, trained on the wild_receipt dataset, excelling in extracting structured information from invoices.
Sequence Labeling Transformers
jinhybr
340
8
Trocr Large Str
TrOCR is a Transformer-based optical character recognition model designed for single-line text images, fine-tuned on multiple standard datasets.
Text Recognition Transformers
microsoft
571
17
Layoutlmv3 Finetuned Invoice
An invoice information extraction model fine-tuned from LayoutLMv3-base on the SROIE dataset, excelling in token classification tasks.
Text Recognition Transformers
oussama
52
5
Layoutlmv3 Finetuned Wildreceipt
A version of LayoutLMv3-base fine-tuned on the WildReceipt dataset for receipt key information extraction.
Text Recognition Transformers
Theivaprakasham
118
3
Layoutlmv3 Finetuned Invoice
An invoice information extraction model fine-tuned based on the LayoutLMv3 architecture, demonstrating outstanding performance on the SROIE dataset
Text Recognition Transformers
ronak1998
71
3
Layoutlmv3 Finetuned Sroie
A document understanding model fine-tuned from Microsoft's LayoutLMv3-base on the SROIE dataset, excelling in extracting structured information from scanned documents.
Text Recognition Transformers
Theivaprakasham
409
0
Layoutlmv3 Finetuned Invoice
A version of LayoutLMv3-base fine-tuned on an invoice dataset for invoice information extraction
Text Recognition Transformers
Theivaprakasham
896
20